01A isolates experiments

if (run_scripts) source("script/01A-isolates_experiments-01-metadata.R")

Isolate IDs

68 isolates were used in the pairwise competition assay. The csv contains:

  • ExpID: isolation ID used by Djordje. The first and second numbers indicate inoculum and replicates in Goldford2018.
  • ID: Sanger sequencing ID. Unique ID for sequenced isolates from Sanchez lab.
  • Community: community ID used in the competition assay.
  • Isolate: Isolates ID within a community ID, used in the competition assay.
isolates_ID_match <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_ID_match.csv", col_types = cols())
isolates_ID_match

Community metadata

Community IDs and basic information

13 self-assembled communities, two across-community assembly, two random assmebly

communities <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/communities.csv", col_types = cols())
communities

01B isolates sanger sequencing

Read isolate 16S sequences

Read and merge ab1 files

This script reads the raw ab1 file from Sanger sequences and merges the forward and backward sequences. It may take ~20 minutes to read all 420 sequences from 210 isolates.

#' This script takes ~10mins to run
#' Do not run this script if `isolates_16S` is already available. 
if (run_scripts) source("script/01B-isolate_sanger_seq-01-read_align.R")

These isolates include Djordje’s isolates from simplicity paper, and Jean’s isolates from random environments. - ID: Sanger sequencing ID. - Sequence: Sanger sequence. - ConsensusLength:

isolates_16S <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_16S.csv", col_types = cols())
isolates_16S

Taxonomy classifier

Use RDP taxonomy classifier to determine the isolates’ taxonomy. Assign fermenter/respirator and gram positive/negative according to families.

# This may take a few seconds
if (run_scripts) source("script/01B-isolate_sanger_seq-02-RDP.R")
  • ID
  • Fermenter: Fermenting sugar or not accroding to the taxanomy
  • GramPositive: gram positive or negative.
  • Family
  • Genus
  • GenusScore
  • Sequence

Multiple sequence alignment and make tree

Save the multiple sequence alignment result, ggtree-compatible tree, and the isolate RDP information with the tree in data/temp/isolates_sanger_seq.Rdata.

  • aln
  • aln2
  • tree
  • isolates_seq
if (run_scripts) source("script/01B-isolate_sanger_seq-03-make_tree.R")
load("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_sanger_seq.Rdata")
isolates_seq
DNAStringSet object of length 68:
     width seq                                                                                         names               
 [1]   657 TACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCC...TTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATCC 1.2.A.1
 [2]   826 TACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCC...TTACCAGCACGTTATGGTGGGCACTCTAAGGAGACTGCCGGTGA 1.4.A.1
 [3]   525 GACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAG...CGATGCAACGCGAAGAACCTTACCTACTCTTGACATCCAGAGAA 1.4.A.3
 [4]   790 TACGGGAGGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCC...GTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGC 1.4.B.4
 [5]   741 TCACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGT...AGCAACGCGAAGAACCTTACCAGGCCTTGACATGCAGAGAACTT 1.6.A.1
 ...   ... ...
[64]   716 TCCCTAGCTGGTCTGAGAGGATGACCAGCCACACTGGAACTGAG...TTTAATTCGATGCAACGCGAAGAACCTTACCTACTCTTGACATC 11.2.B.8.2
[65]   838 TGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGAC...TTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCG 11.1.C.1
[66]   676 TACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCC...AGAACCTTACCAGGCCTTGACATGCAGAGAACTTTCCAGAGATG 11.5.B.2
[67]   772 NNNNNNNNNNNNNNNNNNNNNTGCAANTCGAGCGGCAGCGGGAA...NGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGGTANN 2.6.A.5
[68]   803 NNNNNNGNNNNNNNNNNGCAGTCGAGCGGTAGCACAGGGAGCTT...GTCCACGCCGTAAACGATGTCGACTTGGNNNTTGTGCCCNTGNN 10.2.C.4

Plot basic tree

01C isolates OD and CFU conversion

Read OD (isolates and pairs)

Read the OD data generated from the analysis script for each experiment batch. The table below is the communities in each experimental batch.

Experimental batch Community
B2 C11R1 (except for isolate 1), C10R2, C2R6, C2R8, C7R1, C8R4
C C11R1 isolate 1 (the OD data is in OD_C2.csv)
C2 C11R2
D2 C11R5, C1R2, C1R4, C1R6, C1R7, C4R1

Read the OD files and export an overall OD data in csv output/OD.csv and an isolate OD csv output/isolates_OD.csv

if (run_scripts) source("script/01C-isolates_OD_CFU-01-read_OD.R")

Isolate OD at each transfer. Abs values have been averaged out among replicates.

There are 68 isolates used in the pairwise competition experiments.

Write all OD data in data/temp/OD.csv and isolate OD in data/temp/isolates_OD.csv

isolates_OD <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_OD.csv", col_types = cols())
isolates_OD

Read isolates CFU

if (run_scripts) source("script/01C-isolates_OD_CFU-02-read_CFU.R")

Colony counts at the given dilution factors.

There are 68 isolates used in the pairwise competition experiments.

Write isolate OD and CFU in data/temp/isolates_OD_CFU.csv

Epsilon: OD-CFU conversion coefficient

if (run_scripts) source("script/01C-isolates_OD_CFU-03-epsilon.R")

To determine the outcomes of pairwise competition, I compare the frequencies changes between T1 and T8, which were empirically measured at different approaches and have to be converted into the same unit for comparison.

  • T1 frequencies are determined by mixing two isolate inocula with the standardized OD (OD620 = 0.1) at three initial frequencies: 5:95, 50:50, 95:5. T1 frequencies are thus OD frequencies.

  • T8 frequencies are determined by plating the mature media (48 hour growth cycle) on rich agar plates and counting the colony of each isolate. T8 frequencies are thus CFU frequencies.

In this section, I convert the OD frequencies at T1 to CFU frequencies. For each isolate, the CFU and OD have the following relationships.

\[CFU_A = \frac{OD_A DF_A v_A}{\epsilon_A}\]

\[CFU_A = \epsilon_A OD_A DF_A v_A\]

where \(OD\) is the measured OD at 620 nm, \(DF\) is the dilution factor (\(10^4\) or \(10^5\)), \(v\) is the plating volume (20 uL for all experiment) \(\epsilon\) is the conversion coefficient between OD and CFU.

isolates_epsilon <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_epsilon.csv", col_types = cols())
isolates_epsilon

01D isolates growth rates

if (run_scripts) source("script/01D-isolates_growth_rate-01-read_data.R")
isolates_growth_traits <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_growth_traits.csv", col_type = cols())
isolates_growth_traits

isolates_growth_traits contains many columns. CS is a wildcard for any carbon source measured. - r_CS_*hr: growth rates on carbon sources at 12, 16, 28, 48 hr, measured from the growth curve data. Data from Sylvie data/raw/growth_rate/raw_gcurves_all_sylvies.csv. - X_CS_*hr: secreted acids (acetate, lactate, succinate) amount at 0, 16, 28, 48 hr. Data from Sylvie data/raw/growth_rate/Estrela_2021_isolates_ph_OAs.csv - pH_*hr: pH measured at 0, 16, 28, 48 hr. Data from Sylvie data/raw/growth_rate/Estrela_2021_isolates_ph_OAs.csv. - PreferredCS and SecretedCS: acid preference determined by the sequence of acid used, manually corrected by eyeballing the figure 01D-byproduct_conc.png. Data from Sylvie data/raw/growth_rate/Estrela_2021_isolates_ph_OAs.csv. - ByproductSum_*hr: sum of acids produced at 0, 16, 28, 48 hr. Data from Sylvie data/raw/growth_rate/raw_gcurves_all_sylvies.

01E match isolates to ESV for community abundanuce

Read community ESV sequences

if (run_scripts) source("script/01E-match_community_abundance-01-community_sequence.R")

OTU table output communities_abundance comes from the curated community ESV table from Sylvie.

I formatted the table so that it has the following variables:

Match community ESVs and isolate Sanger sequences

# It may take 10 mins
if (run_scripts) source("script/01E-match_community_abundance-02-match_isolate_16S.R")

IUPAC notation for DNA base pairs

W, S, M, K, R, Y represent two possibilities for one base pair B, D, H, V represent three possiblities N means any nucleotides (but not a gap)

I tested five alignment methods in Biostrings::pairwiseAlignment()

sequences_alignment <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/sequences_alignment.csv", col_type = cols())
sequences_alignment

Relative abundance explained by the sequences

if (run_scripts) source("script/01E-match_community_abundance-03-matched_abundance.R")

ESV abundances of 13 communities used in the competition assay

plot_abundance <- function(df, label_x = "Community", label_y = "RelativeAbundance", fill = "CommunityESVID") {
    ggplot(df) +
        geom_bar(aes_string(x = label_x, y = label_y, fill = fill),
                 position = "stack", stat = "identity", col = 1) +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 90)) +
        scale_y_continuous(expand = c(0, 0))
}
communities_abundance <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/communities_abundance.csv", col_types = cols())
communities_abundance %>% plot_abundance(label_x = "Community", label_y = "RelativeAbundance", fill = "ESVFamily")

Abundances of ESVs matched to Sangers

isolates_abundance %>%
    left_join(isolates_RDP) %>%
    mutate(Community = ordered(Community,  communities$Community)) %>%
    plot_abundance(label_x = "Community", label_y = " RelativeAbundance", fill = "Family") +
    labs(x = "", y = "Relative abundance") +
    scale_fill_manual(values = setNames(color_sets$Color, color_sets$Family)) +
    scale_x_discrete(expand=c(0,0)) +
    scale_y_continuous(expand=c(0,0), limits = c(0,1), breaks = seq(0,1, .25)) 
Joining, by = "ExpID"
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
Warning: Removed 20 rows containing missing values (position_stack).

Summary

Combine and generate the isolate metadata file

if (run_scripts) source("script/01-isolates-01-read_match_isolate.R")

Isolates information in one data.frame isolates. 68 isolates

isolates <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/isolates.csv", col_types = cols())
isolates

Plot isolates’ functions (Thies section is deprecated)

Growth rates on varied CSs

isolates %>%
    select(ID, Fermenter, starts_with("r_")) %>%
    pivot_longer(cols = c(-Fermenter, -ID), names_prefix = "r_", names_to = "CS", values_to = "r") %>%
    mutate(CS = factor(CS, c("glucose", "acetate", "lactate", "succinate", "gluconate", "ketogluconate"))) %>%
    filter(!is.na(Fermenter)) %>%
    ggplot() +
    geom_histogram(aes(x = r, fill = Fermenter), color = 1, binwidth = 0.005) +
    facet_wrap(.~CS, nrow = 3, scales = "free_y") +
    theme_classic()
Warning: Removed 846 rows containing non-finite values (stat_bin).

OD at 16hr on varied CSs

isolates %>%
    select(ID, Fermenter, starts_with("OD")) %>%
    pivot_longer(cols = c(-Fermenter, -ID), names_prefix = "OD620_16h_", names_to = "CS", values_to = "r") %>%
    mutate(CS = factor(CS, c("glucose", "acetate", "succinate", "citrate", "fructose", "arabinose", "galactose", "ribose", "glycerol", "pyruvate", "malate", "fumarate"))) %>%
    filter(!is.na(Fermenter)) %>%
    ggplot() +
    geom_histogram(aes(x = r, fill = Fermenter), color = 1, binwidth = 0.01) +
    facet_wrap(.~CS, nrow = 4, scales = "free_y") +
    theme_classic()
Warning: Removed 720 rows containing non-finite values (stat_bin).

Acids production

isolates %>%
    select(ID, Fermenter, starts_with("X_"), starts_with("pH")) %>%
    pivot_longer(cols = c(-Fermenter, -ID), names_prefix = "X_", names_sep = "_", names_to = c("CS", "Time"), values_to = "Amount") %>%
    mutate(Time = sub("hr", "", Time)) %>%
    mutate(CS = factor(CS, c("pH", "acetate", "lactate", "succinate"))) %>%
    filter(!is.na(Fermenter)) %>%
    ggplot() +
    geom_line(aes(x = Time, y = Amount, group = ID, color = Fermenter)) +
    facet_wrap(.~CS, nrow = 2, scales = "free_y") +
    theme_classic()
Warning: Removed 180 row(s) containing missing values (geom_path).

---
title: "Analysis on isolates used in pairwise competitions"
author: "Chang-Yu Chang"
date: "`r Sys.Date()`"
output:
  html_notebook:  
    toc: yes
    number_sections: no
---

```{r setup, include = F}
knitr::opts_chunk$set(cache = FALSE, echo = TRUE)
library(tidyverse)
run_scripts <- F
```


# 01A isolates experiments

```{r}
if (run_scripts) source("script/01A-isolates_experiments-01-metadata.R")
```


## Isolate IDs

68 isolates were used in the pairwise competition assay. The csv contains:

- `ExpID`: isolation ID used by Djordje. The first and second numbers indicate inoculum and replicates in Goldford2018.
- `ID`: Sanger sequencing ID. Unique ID for sequenced isolates from Sanchez lab.
- `Community`: community ID used in the competition assay.
- `Isolate`: Isolates ID within a community ID, used in the competition assay.


```{r}
isolates_ID_match <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_ID_match.csv", col_types = cols())
isolates_ID_match
```

## Community metadata

Community IDs and basic information

13 self-assembled communities, two across-community assembly, two random assmebly

```{r}
communities <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/communities.csv", col_types = cols())
communities
```

# 01B isolates sanger sequencing

-   Read and align isolate 16S sequences. 1) Read raw `.ab1` file and 2) Merge F and R sequences.
-   Classify isolate taxonomy using RDP classifier. `rRDP`
-   Align seqeunces using MUSCLE. `msa`
-   Build phylogenetic tree by using FastTree or RAxML. `ggtree`

## Read isolate 16S sequences

Read and merge ab1 files

This script reads the raw ab1 file from Sanger sequences and merges the forward and backward sequences. It may take \~20 minutes to read all 420 sequences from 210 isolates.

```{r}
#' This script takes ~10mins to run
#' Do not run this script if `isolates_16S` is already available. 
if (run_scripts) source("script/01B-isolate_sanger_seq-01-read_align.R")
```

These isolates include Djordje's isolates from simplicity paper, and Jean's isolates from random environments.
- `ID`: Sanger sequencing ID.
- `Sequence`: Sanger sequence.
- `ConsensusLength`: 

```{r}
isolates_16S <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_16S.csv", col_types = cols())
isolates_16S
```

## Taxonomy classifier

Use RDP taxonomy classifier to determine the isolates' taxonomy. Assign fermenter/respirator and gram positive/negative according to families. 

```{r}
# This may take a few seconds
if (run_scripts) source("script/01B-isolate_sanger_seq-02-RDP.R")
```

- `ID`
- `Fermenter`: Fermenting sugar or not accroding to the taxanomy
- `GramPositive`: gram positive or negative.
- `Family`
- `Genus`
- `GenusScore`
- `Sequence`

```{r}
isolates_RDP <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_RDP.csv", col_types = cols())
isolates_RDP
```

## Multiple sequence alignment and make tree

Save the multiple sequence alignment result, ggtree-compatible tree, and the isolate RDP information with the tree in `data/temp/isolates_sanger_seq.Rdata`.

- `aln`
- `aln2`
- `tree`
- `isolates_seq`

```{r}
if (run_scripts) source("script/01B-isolate_sanger_seq-03-make_tree.R")
```

```{r}
load("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_sanger_seq.Rdata")
isolates_seq
```


Plot basic tree

```{r fig.width=10, fig.height=10}
ggtree(tree) + ggplot2::xlim(0, 0.5)
```



# 01C isolates OD and CFU conversion

-   Read ODs of isolates and pairs that were measured in pairwise competition experiment. `isolates_OD.csv`
-   Read isolates' CFU on monoculture `isolates_CFU_T8.csv`
-   Match isolates' OD and CFU `isolates_OD_CFU.csv`
-   Calculate uncertainty in epsilon using error propagation theory

## Read OD (isolates and pairs)

Read the OD data generated from the analysis script for each experiment batch. The table below is the communities in each experimental batch.

| Experimental batch | Community                                                   |
|--------------------|-------------------------------------------------------------|
| B2                 | C11R1 (except for isolate 1), C10R2, C2R6, C2R8, C7R1, C8R4 |
| C                  | C11R1 isolate 1 (the OD data is in `OD_C2.csv`)             |
| C2                 | C11R2                                                       |
| D2                 | C11R5, C1R2, C1R4, C1R6, C1R7, C4R1                         |

Read the OD files and export an overall OD data in csv `output/OD.csv` and an isolate OD csv `output/isolates_OD.csv`

```{r}
if (run_scripts) source("script/01C-isolates_OD_CFU-01-read_OD.R")
```

Isolate OD at each transfer. Abs values have been averaged out among replicates.

There are 68 isolates used in the pairwise competition experiments.

Write all OD data in `data/temp/OD.csv` and isolate OD in `data/temp/isolates_OD.csv`

```{r}
isolates_OD <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_OD.csv", col_types = cols())
isolates_OD
```

## Read isolates CFU

```{r}
if (run_scripts) source("script/01C-isolates_OD_CFU-02-read_CFU.R")
```

Colony counts at the given dilution factors.

```{r}
isolates_OD_CFU <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_OD_CFU.csv", col_type = cols())
isolates_OD_CFU
```

There are 68 isolates used in the pairwise competition experiments.

Write isolate OD and CFU in `data/temp/isolates_OD_CFU.csv`

## Epsilon: OD-CFU conversion coefficient

```{r warning = F}
if (run_scripts) source("script/01C-isolates_OD_CFU-03-epsilon.R")
```

To determine the outcomes of pairwise competition, I compare the frequencies changes between T1 and T8, which were empirically measured at different approaches and have to be converted into the same unit for comparison.

-   T1 frequencies are determined by mixing two isolate inocula with the standardized OD (OD620 = 0.1) at three initial frequencies: 5:95, 50:50, 95:5. T1 frequencies are thus OD frequencies.

-   T8 frequencies are determined by plating the mature media (48 hour growth cycle) on rich agar plates and counting the colony of each isolate. T8 frequencies are thus CFU frequencies.

In this section, I convert the OD frequencies at T1 to CFU frequencies. For each isolate, the CFU and OD have the following relationships.

$$CFU_A = \frac{OD_A DF_A v_A}{\epsilon_A}$$

$$CFU_A = \epsilon_A OD_A DF_A v_A$$

where $OD$ is the measured OD at 620 nm, $DF$ is the dilution factor ($10^4$ or $10^5$), $v$ is the plating volume (20 uL for all experiment) $\epsilon$ is the conversion coefficient between OD and CFU.

```{r}
isolates_epsilon <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_epsilon.csv", col_types = cols())
isolates_epsilon
```

# 01D isolates growth rates


```{r}
if (run_scripts) source("script/01D-isolates_growth_rate-01-read_data.R")
```


```{r}
isolates_growth_traits <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_growth_traits.csv", col_type = cols())
isolates_growth_traits
```

`isolates_growth_traits` contains many columns. CS is a wildcard for any carbon source measured.
    - `r_CS_*hr`: growth rates on carbon sources at 12, 16, 28, 48 hr, measured from the growth curve data. Data from Sylvie `data/raw/growth_rate/raw_gcurves_all_sylvies.csv`.
    - `X_CS_*hr`: secreted acids (acetate, lactate, succinate) amount at 0, 16, 28, 48 hr. Data from Sylvie `data/raw/growth_rate/Estrela_2021_isolates_ph_OAs.csv`
    - `pH_*hr`: pH measured at 0, 16, 28, 48 hr. Data from Sylvie `data/raw/growth_rate/Estrela_2021_isolates_ph_OAs.csv`.
    - `PreferredCS` and `SecretedCS`: acid preference determined by the sequence of acid used, manually corrected by eyeballing the figure `01D-byproduct_conc.png`. Data from Sylvie `data/raw/growth_rate/Estrela_2021_isolates_ph_OAs.csv`.
    - `ByproductSum_*hr`: sum of acids produced at 0, 16, 28, 48 hr. Data from Sylvie  `data/raw/growth_rate/raw_gcurves_all_sylvies`.



# 01E match isolates to ESV for community abundanuce


Read community ESV sequences

```{r}
if (run_scripts) source("script/01E-match_community_abundance-01-community_sequence.R")
```

OTU table output `communities_abundance` comes from the curated community ESV table from Sylvie.

I formatted the table so that it has the following variables:

-   `SampleID`: experiment ID used by Nanxi or Sylvie.
-   `Community`: community, for instance, C2R4 or C1Rpool.
-   `Transfer`: the transfer when the community was sequenced.
-   `Adundance`: the number of this sequence in the community.
-   `RelativeAbundance`
-   `CommunityESVID`: the sequence identifier. This identifier is community-sequence specific.
-   `ESV`: the DNA sequence in 16s V4 region.
-   `CarbonSource`: the carbon source used for assembly
-   `Order`, `Family`, and `Genus`: SILVA assigned taxonomy


```{r echo = T}
communities_abundance <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/communities_abundance.csv", col_types = cols())
communities_abundance
```


## Match community ESVs and isolate Sanger sequences

```{r}
# It may take 10 mins
if (run_scripts) source("script/01E-match_community_abundance-02-match_isolate_16S.R")
```

IUPAC notation for DNA base pairs

W, S, M, K, R, Y represent two possibilities for one base pair B, D, H, V represent three possiblities N means any nucleotides (but not a gap)

I tested five alignment methods in `Biostrings::pairwiseAlignment()`

```{r echo = T}
sequences_alignment <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/sequences_alignment.csv", col_type = cols())
sequences_alignment
```


## Relative abundance explained by the sequences

```{r message = F, warning = F}
if (run_scripts) source("script/01E-match_community_abundance-03-matched_abundance.R")
```

ESV abundances of 13 communities used in the competition assay

```{r fig.width = 5, fig.height = 3}
plot_abundance <- function(df, label_x = "Community", label_y = "RelativeAbundance", fill = "CommunityESVID") {
    ggplot(df) +
        geom_bar(aes_string(x = label_x, y = label_y, fill = fill),
                 position = "stack", stat = "identity", col = 1) +
        theme_bw() +
        theme(axis.text.x = element_text(angle = 90)) +
        scale_y_continuous(expand = c(0, 0))
}
communities_abundance <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/communities_abundance.csv", col_types = cols())
communities_abundance %>% plot_abundance(label_x = "Community", label_y = "RelativeAbundance", fill = "ESVFamily")
```

Abundances of ESVs matched to Sangers

```{r}
isolates_abundance <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_abundance.csv", col_types = cols())
isolates_RDP <- read_csv("~/Dropbox/lab/emergent-coexistence/data/temp/isolates_RDP.csv", col_types = cols())
# Color sets from Sylvie
color_sets <- tibble(Color = c("yellow", "deepskyblue3", "blue", "darkorchid2", "firebrick", "orange2", "grey"),
  Family = c("Aeromonadaceae", "Enterobacteriaceae", "Moraxellaceae", "Pseudomonadaceae","Comamonadaceae","Alcaligenaceae", "Sphingobacteriaceae"))

isolates_abundance %>%
    left_join(isolates_RDP) %>%
    mutate(Community = ordered(Community,  communities$Community)) %>%
    plot_abundance(label_x = "Community", label_y = " RelativeAbundance", fill = "Family") +
    labs(x = "", y = "Relative abundance") +
    scale_fill_manual(values = setNames(color_sets$Color, color_sets$Family)) +
    scale_x_discrete(expand=c(0,0)) +
    scale_y_continuous(expand=c(0,0), limits = c(0,1), breaks = seq(0,1, .25)) 

```


# Summary

## Combine and generate the isolate metadata file

```{r}
if (run_scripts) source("script/01-isolates-01-read_match_isolate.R")
```

Isolates information in one data.frame `isolates`. 68 isolates

```{r}
isolates <- read_csv("~/Dropbox/lab/emergent-coexistence/data/output/isolates.csv", col_types = cols())
isolates
```





## Plot isolates' functions (Thies section is deprecated)

### Growth rates on varied CSs

```{r}
isolates %>%
    select(ID, Fermenter, starts_with("r_")) %>%
    pivot_longer(cols = c(-Fermenter, -ID), names_prefix = "r_", names_to = "CS", values_to = "r") %>%
    mutate(CS = factor(CS, c("glucose", "acetate", "lactate", "succinate", "gluconate", "ketogluconate"))) %>%
    filter(!is.na(Fermenter)) %>%
    ggplot() +
    geom_histogram(aes(x = r, fill = Fermenter), color = 1, binwidth = 0.005) +
    facet_wrap(.~CS, nrow = 3, scales = "free_y") +
    theme_classic()
```

### OD at 16hr on varied CSs

```{r}
isolates %>%
    select(ID, Fermenter, starts_with("OD")) %>%
    pivot_longer(cols = c(-Fermenter, -ID), names_prefix = "OD620_16h_", names_to = "CS", values_to = "r") %>%
    mutate(CS = factor(CS, c("glucose", "acetate", "succinate", "citrate", "fructose", "arabinose", "galactose", "ribose", "glycerol", "pyruvate", "malate", "fumarate"))) %>%
    filter(!is.na(Fermenter)) %>%
    ggplot() +
    geom_histogram(aes(x = r, fill = Fermenter), color = 1, binwidth = 0.01) +
    facet_wrap(.~CS, nrow = 4, scales = "free_y") +
    theme_classic()
```


### Acids production
```{r}
isolates %>%
    select(ID, Fermenter, starts_with("X_"), starts_with("pH")) %>%
    pivot_longer(cols = c(-Fermenter, -ID), names_prefix = "X_", names_sep = "_", names_to = c("CS", "Time"), values_to = "Amount") %>%
    mutate(Time = sub("hr", "", Time)) %>%
    mutate(CS = factor(CS, c("pH", "acetate", "lactate", "succinate"))) %>%
    filter(!is.na(Fermenter)) %>%
    ggplot() +
    geom_line(aes(x = Time, y = Amount, group = ID, color = Fermenter)) +
    facet_wrap(.~CS, nrow = 2, scales = "free_y") +
    theme_classic()
```
















